Replace kl_penalty_reference_step with kl_penalty_step_lag by angkywilliam · Pull Request #625 · OpenPipe/ART

angkywilliam · 2026-03-20T04:34:37Z

Summary

- Rename `kl_penalty_reference_step` to `kl_penalty_step_lag` in `PipelineTrainer`
- `None` (default): uses step 0 as KL reference (anchor to initial model)
- `>= 1`: uses `max(0, current_step - lag)` as reference (rolling anchor)
- Add validation that `kl_penalty_step_lag` must be >= 1 if specified
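
The reference-step rule described above can be sketched as a small standalone function. This is only an illustration of the stated behavior, not the code in this PR: the helper name `resolve_kl_reference_step` and the use of `ValueError` are assumptions, and the actual `PipelineTrainer` may validate the parameter elsewhere (for example at construction time).

```python
from typing import Optional


def resolve_kl_reference_step(
    current_step: int, kl_penalty_step_lag: Optional[int] = None
) -> int:
    """Hypothetical helper: pick the step used as the KL reference.

    Mirrors the rules in the summary; not the PipelineTrainer implementation.
    """
    if kl_penalty_step_lag is None:
        # Default: anchor to the initial model (step 0).
        return 0
    if kl_penalty_step_lag < 1:
        # Assumed error type; the summary only says the value must be >= 1.
        raise ValueError("kl_penalty_step_lag must be >= 1 if specified")
    # Rolling anchor: lag behind the current step, clamped at step 0.
    return max(0, current_step - kl_penalty_step_lag)
```

With a lag of 3, step 10 would be penalized against step 7, while any step earlier than the lag falls back to step 0 because of the clamp.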

Test plan

Unit tests pass (test_pipeline_trainer_local_backend.py)
New tests for lag computation added
Integration tests (requires 2 GPUs)

🤖 Generated with Claude Code

…ipelineTrainer

- Rename parameter from `kl_penalty_reference_step` to `kl_penalty_step_lag`
- `None` (default): uses step 0 as KL reference (anchor to initial model)
- `>= 1`: uses `max(0, current_step - lag)` as reference (rolling anchor)
- Add validation that kl_penalty_step_lag must be >= 1 if specified
- Update existing tests and add new tests for lag computation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add tinker variant of KL-penalized advantage training script and align model naming conventions (backend-random-coef) across both.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

vivekkalyan and others added 12 commits March 18, 2026 13:49

feat: Support PipelineTrainer with dedicated LocalBackend

6a67282

test: Add dedicated LocalBackend smoke coverage

3067cc1

docs: Document dedicated LocalBackend pipeline support

e4c6b2b

refactor: simplify LocalBackend pipeline trainer integration

47579de

refactor: Narrow LocalBackend train types

dbefea6

test: Add max batch size regression coverage

05f39ef

fix: Respect max batch size in PipelineTrainer

e748071

feat: Add sampled KL support to pipeline backends

9775490

refactor: Validate TinkerNative KL source before state lookup

e113b3b

test: Add PipelineTrainer KL smoke coverage

79bf8d5

fix: Preserve sampled KL metric in TinkerNativeBackend

5ef9eab

angkywilliam requested a review from vivekkalyan March 20, 2026 04:49

vivekkalyan force-pushed the feat/pipeline-kl branch from 5ef9eab to aa3e6f7 Compare March 21, 2026 01:16

angkywilliam wants to merge 13 commits into feat/pipeline-kl from feat/pipeline-kl-step-lag

2 participants
